NSF PAR Search | NSF Public Access Repository


Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

Yaras, Can; Chen, Siyi; Wang, Peng; Qu, Qing (March 2025, Second Conference on Parsimony and Learning (CPAL 2025))

Multimodal learning has recently gained significant popularity, demonstrating impressive performance across various zero-shot classification tasks and a range of perceptive and generative applications. Models such as Contrastive Language–Image Pretraining (CLIP) are designed to bridge different modalities, such as images and text, by learning a shared representation space through contrastive learning. Despite their success, the working mechanisms of multimodal learning remain poorly understood. Notably, these models often exhibit a "modality gap", where different modalities occupy distinct regions within the shared representation space. In this work, we conduct an in-depth analysis of the emergence of the modality gap by characterizing the gradient flow learning dynamics. Specifically, we identify the critical roles of mismatched data pairs and a learnable temperature parameter in causing and perpetuating the modality gap during training. Furthermore, our theoretical insights are validated through experiments on practical CLIP models. These findings provide principled guidance for mitigating the modality gap, including strategies such as appropriate temperature scheduling and modality swapping. Additionally, we demonstrate that closing the modality gap leads to improved performance on tasks such as image-text retrieval.
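As a rough illustration of the quantities this abstract refers to, the sketch below computes a CLIP-style symmetric contrastive loss with a temperature parameter and one simple proxy for the modality gap. It is not taken from the paper: the function names `clip_loss` and `modality_gap` are hypothetical, and measuring the gap as the distance between the centroids of the normalized image and text embeddings is an assumption (one commonly used choice), not necessarily the authors' definition.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def clip_loss(image_emb, text_emb, temperature):
    """Symmetric contrastive loss; matched pairs share the same row index."""
    image_emb, text_emb = l2_normalize(image_emb), l2_normalize(text_emb)
    logits = image_emb @ text_emb.T / temperature
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits.T))

def modality_gap(image_emb, text_emb):
    """Assumed proxy: distance between the two modality centroids."""
    image_emb, text_emb = l2_normalize(image_emb), l2_normalize(text_emb)
    return float(np.linalg.norm(image_emb.mean(axis=0) - text_emb.mean(axis=0)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(8, 16))
    texts = images + 0.5 * rng.normal(size=(8, 16))  # noisy matched pairs
    print("loss:", clip_loss(images, texts, temperature=0.07))
    print("gap:", modality_gap(images, texts))
```

In this toy setting, lowering `temperature` sharpens the softmax over similarities, which is the parameter the abstract identifies as playing a role in the gap during training.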
Full Text Available
Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

Yaras, Can; Wang, Peng; Balzano, Laura; Qu, Qing (June 2024, International Conference on Machine Learning)

While overparameterization in machine learning models offers great benefits in terms of optimization and generalization, it also leads to increased computational requirements as model sizes grow. In this work, we show that by leveraging the inherent low-dimensional structures of data and compressible dynamics within the model parameters, we can reap the benefits of overparameterization without the computational burdens. In practice, we demonstrate the effectiveness of this approach for deep low-rank matrix completion as well as fine-tuning language models. Our approach is grounded in theoretical findings for deep overparameterized low-rank matrix recovery, where we show that the learning dynamics of each weight matrix are confined to an invariant low-dimensional subspace. Consequently, we can construct and train compact, highly compressed factorizations possessing the same benefits as their overparameterized counterparts. In the context of deep matrix completion, our technique substantially improves training efficiency while retaining the advantages of overparameterization. For language model fine-tuning, we propose a method called "Deep LoRA", which improves the existing low-rank adaptation (LoRA) technique, leading to reduced overfitting and a simplified hyperparameter setup, while maintaining comparable efficiency. We validate the effectiveness of Deep LoRA on natural language tasks, particularly when fine-tuning with limited data.
Full Text Available
Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

Yaras, Can; Wang, Peng; Balzano, Laura; Qu, Qing (May 2024, Proceedings of the 41st International Conference on Machine Learning)

Full Text Available
Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Matrix Factorizations

Yaras, Can; Wang, Peng; Hu, Wei; Zhu, Zhihui; Balzano, Laura; Qu, Qing (November 2023, NeurIPS 2023 Workshop M3L)

An extensively studied phenomenon of the past few years in training deep networks is the implicit bias of gradient descent towards parsimonious solutions. In this work, we further investigate this phenomenon by narrowing our focus to deep matrix factorization, where we reveal surprising low-dimensional structures in the learning dynamics when the target matrix is low-rank. Specifically, we show that the evolution of gradient descent starting from arbitrary orthogonal initialization only affects a minimal portion of singular vector spaces across all weight matrices. In other words, the learning process happens only within a small invariant subspace of each weight matrix, despite the fact that all parameters are updated throughout training. From this, we provide rigorous justification for low-rank training in a specific, yet practical setting. In particular, we demonstrate that we can construct compressed factorizations that are equivalent to full-width, deep factorizations throughout training for solving low-rank matrix completion problems efficiently.
Full Text Available
Miniaturizing a Chip-Scale Spectrometer Using Local Strain Engineering and Total-Variation Regularized Reconstruction

https://doi.org/10.1021/acs.nanolett.2c02654

Sarwar, Tuba; Yaras, Can; Li, Xiang; Qu, Qing; Ku, Pei-Cheng (October 2022, Nano Letters)

Full Text Available
Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold

Yaras, Can; Wang, Peng; Zhu, Zhihui; Balzano, Laura; Qu, Qing (January 2022, Advances in Neural Information Processing Systems)

When training overparameterized deep networks for classification tasks, it has been widely observed that the learned features exhibit a so-called “neural collapse” phenomenon. More specifically, for the output features of the penultimate layer, for each class the within-class features converge to their means, and the means of different classes exhibit a certain tight frame structure, which is also aligned with the last layer’s classifier. As feature normalization in the last layer becomes a common practice in modern representation learning, in this work we theoretically justify the neural collapse phenomenon under normalized features. Based on an unconstrained feature model, we simplify the empirical loss function in a multi-class classification task into a nonconvex optimization problem over the Riemannian manifold by constraining all features and classifiers over the sphere. In this context, we analyze the nonconvex landscape of the Riemannian optimization problem over the product of spheres, showing a benign global landscape in the sense that the only global minimizers are the neural collapse solutions while all other critical points are strict saddle points with negative curvature. Experimental results on practical deep networks corroborate our theory and demonstrate that better representations can be learned faster via feature normalization. Code for our experiments can be found at https://github.com/cjyaras/normalized-neural-collapse.
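To make the geometry in this abstract more concrete, the sketch below builds a simplex equiangular tight frame (ETF) and checks its defining properties, then measures within-class variability for perfectly collapsed features. This is an illustrative assumption rather than the authors' code: the abstract only says "a certain tight frame structure", the simplex ETF is the structure usually associated with neural collapse, and the names `simplex_etf` and `within_class_variability` are hypothetical.

```python
import numpy as np

def simplex_etf(num_classes):
    """Columns are unit-norm class means with pairwise cosine -1/(K-1)."""
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k  # projects out the mean direction
    return np.sqrt(k / (k - 1)) * centering

def within_class_variability(features, labels):
    """Average squared distance of each feature (row) to its class mean."""
    total = 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        total += np.sum((class_feats - class_feats.mean(axis=0)) ** 2)
    return total / len(features)

if __name__ == "__main__":
    k = 4
    means = simplex_etf(k)
    gram = means.T @ means
    print("norms:", np.diag(gram))        # each class mean lies on the unit sphere
    print("off-diagonal:", gram[0, 1])    # equal angles between all class means

    # Fully collapsed features: every sample equals its class mean.
    labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
    features = means.T[labels]
    print("variability:", within_class_variability(features, labels))
```

Because every class mean here has unit norm, the configuration lies on the sphere, which matches the normalized-feature setting the abstract describes.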
Full Text Available
